Learning Statistical and Geometric Models from Microarray Gene Expression Data

Author

  • Yitan Zhu
Abstract

Analysis of microarray gene expression data is important for disease study at the molecular and genomic level. Computational data modeling and analysis are essential for extracting meaningful and specific information from noisy, high-throughput, and large-scale microarray gene expression data. In this dissertation, we propose and develop innovative data modeling and analysis methods for learning statistical and geometric models from gene expression data and subsequently discovering data structure and information associated with disease mechanisms. To provide a high-level overview of gene expression data for easy and insightful understanding of data structure relevant to the physiological event of interest, we propose a novel statistical data clustering and visualization algorithm that is comprehensive and effective for multiple clustering tasks and that overcomes some of the major limitations associated with existing clustering methods. The proposed clustering and visualization algorithm performs progressive, divisive hierarchical clustering and visualization, supported by hierarchical statistical modeling, supervised/unsupervised informative gene/feature selection, supervised/unsupervised data visualization, and user/prior knowledge guidance through human-data interactions, to discover cluster structure within complex, high-dimensional gene expression data. Applications to muscular dystrophy, muscle regeneration, and cancer data demonstrated its ability to identify functionally enriched (co-regulated) gene groups, detect/validate disease types/subtypes, and discover the pathological relationship among multiple disease types reflected by gene expression profiles. For the purpose of selecting suitable clustering algorithm(s) for gene expression data analysis and validating the advantage of our proposed clustering algorithm, we design an
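The general idea of divisive hierarchical clustering combined with unsupervised gene selection, as mentioned in the abstract, can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the dissertation: it swaps in a plain 2-means split and a variance-based gene filter, omits the statistical modeling, visualization, and user guidance entirely, and every function name and parameter (`select_genes_by_variance`, `two_means`, `divisive_clustering`, `max_depth`, `min_size`, `n_keep`) is hypothetical.

```python
import numpy as np


def select_genes_by_variance(X, n_keep):
    """Return column indices of the n_keep highest-variance genes (unsupervised filter)."""
    variances = X.var(axis=0)
    return np.argsort(variances)[::-1][:n_keep]


def two_means(X, n_iter=50):
    """Split the rows of X into two groups with a plain 2-means loop."""
    # Deterministic start: the sample farthest from the overall mean,
    # then the sample farthest from that first center.
    c0 = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Distance of every sample to both centers, shape (n_samples, 2).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def divisive_clustering(X, max_depth=2, min_size=4, n_keep=20):
    """Recursively bisect samples (rows of X); return a list of index arrays (leaf clusters)."""
    X = np.asarray(X, dtype=float)
    leaves = []

    def split(idx, depth):
        # Stop when the depth limit is reached or the node is too small to split.
        if depth == max_depth or len(idx) < 2 * min_size:
            leaves.append(idx)
            return
        sub = X[idx]
        # Re-select informative genes within the current node only.
        genes = select_genes_by_variance(sub, min(n_keep, sub.shape[1]))
        labels = two_means(sub[:, genes])
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) < min_size or len(right) < min_size:
            leaves.append(idx)  # reject splits that create tiny clusters
            return
        split(left, depth + 1)
        split(right, depth + 1)

    split(np.arange(len(X)), 0)
    return leaves
```

Here `X` is assumed to be a samples-by-genes matrix. Re-selecting genes inside each node, rather than once globally, is a design choice made for this sketch so that deeper splits can rely on genes that only vary within a subgroup.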

Related resources

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression profiling provides a wealth of information spanning thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and of differences between tumors. In this study, we reduce the high-dimensional data with a statistical method to select high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...

Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data

Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...

Microarray analysis of gene expression patterns in Arabidopsis seedlings under trehalose, sucrose and sorbitol treatment

Trehalose is the non-reducing alpha-alpha-1,1-linked glucose disaccharide. The biosynthesis precursor of trehalose, trehalose-6-phosphate (T6P), is essential for plant development, growth, carbon utilization and alters photosynthetic capacity but its mode of action is not understood. In the current research, 6 days old seedlings of Arabidopsis thaliana (Columbia ecotype) were grown in liquid cultu...

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are important sources for finding useful molecular patterns. However, the large volume of gene expression data makes it more challenging to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

Publication date: 2009